Towards Computational Efficiency of Next Generation Multimedia Systems

Author

  • Muhammad Usman Karim Khan
Abstract

High throughput demands under complexity- and power-efficiency constraints have imposed numerous design challenges for next generation multimedia systems. Multimedia (especially video) applications impose tight throughput constraints (e.g., frame resolutions beyond 1920×1080, at more than 30 FPS), which must be met by possibly resource- and battery-constrained underlying hardware. Meanwhile, technology scaling in the nano-era has led to high transistor densities. On the one hand, technology scaling provides the application designer with more resources to enable high throughput multimedia processing. On the other hand, these technological advancements come with associated challenges, such as high power densities (the Dark Silicon paradigm). Furthermore, high power densities lead to elevated on-chip temperatures that jeopardize the reliability of the multimedia system, for example through NBTI-induced aging. This suggests that a next generation multimedia system that consumes low power might not be able to fulfil its throughput constraints, while a multimedia system that meets its throughput constraints might be resource- and power-wise inefficient. These contradictions open new frontiers for exploring the software and hardware co-design and co-optimization space. Since the complexity and power consumption of a multimedia system can be reduced at both the software and hardware levels, several software and hardware factors (such as varying workload characteristics, Thermal Design Power or TDP constraints, application-specific architectural optimizations and available hardware resources) play an important role in designing high complexity embedded multimedia systems. State-of-the-art works, however, do not exploit the complete hardware-software design optimization space of advanced embedded multimedia systems under Dark Silicon constraints, and therefore do not fully realize the potential for power, complexity and resource savings and for reliability improvement in long-term system deployment.
The aim of this Ph.D. thesis is to design efficient multimedia (specifically image/video) systems that are easily portable to programmable soft-cores, application-specific hardware platforms, and domain-specific hardware accelerators, while providing power-efficiency and reliability. The key design novelty is to recognize and jointly consider the hardware constraints and the software/application-specific characteristics, and to synergistically and objectively tune software and hardware parameters. Since image/video processing workloads are power hungry, this Ph.D. thesis aims to encompass multiple design aspects (complexity reduction, workload balancing, power reduction, aging optimization) in an integrated manner to improve power and reliability metrics. Moreover, this work builds software and hardware optimizations by analyzing the application and hardware characteristics, and then leveraging application and content knowledge for the design and management of next generation multimedia systems from both the power and the reliability perspective. The design focus of our approaches and strategies is a multi-/many-core system with on-chip hardware co-processors and accelerators. A brief summary of the contributions of this thesis is given below.
Power-Efficient Software Layer: For multimedia systems, the software layer determines system parameters (the number of cores used by the parallel running application(s), the amount of tasks offloaded to hardware accelerators and high-end servers, the voltage-frequency settings of the cores, power-gating control, etc.) and adapts them using feedback from the hardware layer. The goal is to increase the throughput-per-watt metric of the multimedia system. A synopsis of the software-level approaches proposed in this thesis is given below.
Parallelization and Workload Balancing: To avoid computational hotspots and utilize the underlying hardware, the parallelization and workload balancing approaches presented in this thesis target power reduction while meeting the throughput demands of video applications. At runtime, a multi-objective optimization is performed that divides the workload among the cores in either a uniform or a non-uniform manner and tunes the application parameters. On homogeneous cores, the proposed approaches result in up to ~19% power savings compared to the state-of-the-art approach [1], while additional ~7.8% power savings are obtained with non-uniform load distribution. With heterogeneous computing nodes, up to 64% throughput-per-watt improvement is obtained compared to [2].
Resource Budgeting: While considering the throughput demands of multiple multithreaded applications, the resource budgeting approach presented in this thesis divides the available cores and the TDP among these applications. The resources allocated to the applications are adapted at runtime, which improves the throughput of the system by ~1.18× to ~1.45× compared to [3] under varying Dark Silicon scenarios.
Computation Offloading: On the software side, offloading mechanisms driven by video content and throughput constraints are developed to offload computations to a high-end server, achieving considerable energy savings (~20%) compared to [4].
Power-Efficient Hardware Layer: The hardware layer supports video I/O, communication among (possibly heterogeneous) compute nodes, power-efficient video memory design and aging-aware optimizations. Further, this layer exposes some of its functionality to the software layer (for approaches like software-guided frequency tuning of the cores, power gating and feedback of statistics to the software). A brief summary of the architectural contributions of this thesis is given below.
Video I/O and Communications: To develop high throughput applications, the video I/O architectures and the custom hardware for communication among computing nodes proposed in this thesis target communication efficiency at reduced hardware cost.
Hardware Accelerator Sharing/Scheduling: To offload workload from soft-cores to a shared hardware accelerator, or to share the hardware accelerator for processing multiple tasks in a round-robin fashion, hardware accelerator sharing and scheduling approaches are presented such that the throughput requirements of all soft-cores are met, the hardware accelerator is fully utilized and the power consumption of the system is minimized. Similarly, efficient hardware accelerators are designed that provide high throughput and power-efficiency (by selectively clock-gating parts of the accelerator) while meeting the computational constraints of the video system. For the H.264/AVC encoding loop, the proposed approach achieves ~4.14× hardware savings compared to [5], while the proposed edge detection mechanism (for efficient mode computations) results in ~1.9× area savings compared to [6].
Memory Subsystem Design: A hybrid memory architecture, consisting of sectored non-volatile memory (MRAM) based frame buffers and SRAM FIFOs, achieves high power savings at a minimal latency penalty by adaptively turning ON the normally OFF MRAM sectors. Moreover, the on-chip SRAM aging resiliency approach presented here exploits video content properties to reduce the aging rate of the 6T SRAM cells that store the data bits. A controller is proposed that adaptively performs aging-aware, online data adaptation at different spatial and temporal granularities. The above-mentioned software and hardware approaches have resulted in several open-source contributions, which are available for download in the free software pool on the webpage of our lab (Chair for Embedded Systems, CES): http://ces.itec.kit.edu/.
In a nutshell, software and hardware properties are synergistically evaluated to determine the degree of parallelism, task offloading and resource budgeting. Moreover, the proposed approaches result in tunable, software-guided frequency and gating control of the hardware, using feedback from the hardware, in order to lower the power consumption of the system. Further, the hardware layer of the proposed video system consists of a novel accelerator design methodology and a power- and aging-efficient memory subsystem.
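To make the idea of uniform versus non-uniform workload distribution and the throughput-per-watt metric more concrete, the following Python sketch may help. It is only an illustration under simplifying assumptions, not the multi-objective optimization of the thesis: the `Core` class, the function names, the proportional splitting rule and all numeric values are hypothetical, and a core that finishes early is assumed to idle at a fixed idle power until the slowest core is done.

```python
from dataclasses import dataclass


@dataclass
class Core:
    """Hypothetical core model; all fields are illustrative assumptions."""
    name: str
    blocks_per_sec: float   # assumed processing rate (e.g., macroblocks/s)
    active_watts: float     # assumed power while processing
    idle_watts: float       # assumed power while waiting for other cores


def split_workload(total_blocks, cores, uniform=False):
    """Divide total_blocks among cores.

    uniform=True gives every core the same share; otherwise the share is
    proportional to each core's processing rate (a simple non-uniform rule).
    """
    weights = [1.0 if uniform else c.blocks_per_sec for c in cores]
    total_weight = sum(weights)
    shares = [int(total_blocks * w / total_weight) for w in weights]
    # Blocks lost to rounding go to the fastest cores first.
    leftover = total_blocks - sum(shares)
    fastest_first = sorted(range(len(cores)),
                           key=lambda i: cores[i].blocks_per_sec,
                           reverse=True)
    for i in fastest_first[:leftover]:
        shares[i] += 1
    return shares


def frame_time(shares, cores):
    """A frame is finished when the slowest core finishes its share."""
    return max(s / c.blocks_per_sec for s, c in zip(shares, cores))


def throughput_per_watt(shares, cores):
    """Frames per second divided by average power over one frame."""
    t = frame_time(shares, cores)
    energy = 0.0
    for s, c in zip(shares, cores):
        busy = s / c.blocks_per_sec
        energy += c.active_watts * busy + c.idle_watts * (t - busy)
    return (1.0 / t) / (energy / t)


if __name__ == "__main__":
    cores = [Core("fast", 4000.0, 2.0, 0.5), Core("slow", 1000.0, 0.5, 0.1)]
    blocks = 8160  # 16x16 blocks in a 1920x1088 frame
    for uniform in (True, False):
        shares = split_workload(blocks, cores, uniform=uniform)
        print(uniform, shares, frame_time(shares, cores),
              throughput_per_watt(shares, cores))
```

In this toy model, the proportional split lets both cores finish at the same time, so no energy is spent idling and the frame completes sooner than with a uniform split. A real system would additionally tune voltage-frequency settings and application parameters, which this sketch leaves out.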

Similar resources

Radio Access Schemes and Technologies for Next-Generation Network

Multimedia services such as Web browsing, video on demand, and IP telephony delivered to the office and home by wired network systems such as optical fiber systems and ADSL systems are now very popular. This popularity increases the demand for more powerful radio access technologies that offer, for example, high data rates and QoS control. The 3rd generation wireless system already has the ability to...

Seamless Multimedia Services Over All-IP Based Infrastructures: The EVOLUTE Approach

The increasing number of roaming Internet users in combination with the evolution of IP-based applications has created a strong demand for wide-area, broadband access to a number of IP multimedia services. Wireless LANs can complement the next-generation cellular networks by offering a cost-efficient, wireless broadband data solution for hot spot areas. By combining the wide coverage of next-g...

Challenges for Broadband Wireless Technology

Convergence of mobile communications, computing and the Internet is under way. This will be the driving force towards a wireless multimedia society in the 21st century. Unfortunately, since the present mobile communication systems (often referred to as 2G systems) are optimized for real-time voice services, they have quite limited capabilities in providing broadband multimedia services because of their...

Performance of Multi-beam Satellite Systems With A New Bandwidth Sharing Algorithm

Efficient resource allocation is important to guarantee the best performance with a fair distribution of multi-beam satellite capacity to provide satellite multimedia and broadcasting services. In this way, available bandwidth and capacity problems in new satellite systems like Multi-Input-Multi-Output (MIMO), exploring new techniques for enhancing spectral efficiency in satellite communicat...

Resource Allocation Strategies Based on Ant Colony Optimization for Next Generation Wireless Systems

Next generation wireless systems aim at providing a high spectral efficiency, with indoor traffic up to 1 Gbit/s and outdoor traffic up to 100 Mbit/s, offering, at the same time, several multimedia applications, each one marked by its own QoS. In such a dynamic situation, where the number of users, the type of applications and the channel conditions of each user vary rapidly, raises the proble...

A new architecture for advanced telematics services

Advances in Information and Communication Technologies have helped spread the use of telematics, together with a fall in price and a rise in reliability of the systems. The application of such technologies to the mobility area can support the growing requirements for efficiency and safety of transportation systems by means of the development of new services and the strengthening of the existing...

Publication date: 2015